For more details, datasets, and analysis scripts, visit our GitHub webpage.
The QS World University Rankings are a globally recognized framework for evaluating higher education institutions. This project will analyze ranking trends from 2022 to 2024 to uncover patterns and determinants of university performance. The findings will serve as an empirical guide for stakeholders in the education sector.
QS World University Rankings
The QS World University Rankings provide a comprehensive evaluation of over 1,000 higher education institutions worldwide. Published by Quacquarelli Symonds (QS), the rankings are widely recognized for the depth of their research and the breadth of data they collect on university performance. The datasets for 2022, 2023, and 2024, available from the QS website, form the primary basis of our analysis. These tables report detailed performance metrics such as academic reputation, employer reputation, faculty-student ratio, citations per faculty, international faculty, and international student scores. By analyzing these datasets, we aim to uncover trends, evaluate shifts in rankings, and identify the determinants of university performance across the three years.
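The QS ranking methodology combines several metrics, each capturing a distinct aspect of university performance. The weights below are those of the 2022 edition; the 2024 edition revised them and added International Research Network, Employment Outcomes, and Sustainability to the weighted score (these indicators also appear as additional columns in the 2023 and 2024 files):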
Academic Reputation Score (40% weight): Derived from a global academic survey, this score reflects the perceived research quality and academic standing of an institution.
Employer Reputation Score (10% weight): Based on a survey of employers, this score indicates the employability and preparedness of graduates in the workforce.
Faculty Student Score (20% weight): This metric measures the faculty-to-student ratio, providing insight into the teaching and learning environment of the university.
Citations per Faculty Score (20% weight): A measure of research impact, this score is calculated based on the average citations per faculty member, indicating research influence and quality.
International Faculty Score (5% weight): This score assesses the diversity of the faculty by measuring the proportion of international faculty members at the institution.
International Students Score (5% weight): Similarly, this score evaluates the diversity of the student body by looking at the percentage of international students.
Overall Score: A composite score that combines all individual metrics, representing a summarized assessment of a university's overall ranking performance.
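As a rough illustration of how the weights above combine, the sketch below computes a weighted sum of the six indicator scores. It is a simplification under stated assumptions: it uses the 2022-edition weights listed above, assumes every indicator is already on a 0-100 scale, and omits the final rescaling QS applies (the raw 2022 file labels its overall column "score scaled"), so its result need not match the published Overall Score. `QS_WEIGHTS` and `weighted_composite` are hypothetical names, not part of any QS tooling.

```python
# Hypothetical sketch: indicator weights as listed above (2022 edition).
QS_WEIGHTS = {
    "Academic Reputation Score": 0.40,
    "Employer Reputation Score": 0.10,
    "Faculty Student Score": 0.20,
    "Citations per Faculty Score": 0.20,
    "International Faculty Score": 0.05,
    "International Students Score": 0.05,
}


def weighted_composite(scores, weights=QS_WEIGHTS):
    """Return the weighted sum of indicator scores, each assumed to be on a 0-100 scale.

    This omits the final rescaling QS applies, so it will not reproduce the
    published Overall Score exactly.
    """
    total_weight = sum(weights.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total_weight}")
    missing = sorted(set(weights) - set(scores))
    if missing:
        raise KeyError(f"missing indicator scores: {missing}")
    return sum(scores[name] * weight for name, weight in weights.items())


# Usage with made-up scores: 100 on every indicator except International Students.
example = {name: 100.0 for name in QS_WEIGHTS}
example["International Students Score"] = 91.4
print(round(weighted_composite(example), 2))  # 99.57
```

Raising on missing indicators, rather than treating them as zero, keeps a partially filled row from quietly producing a misleadingly low composite.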
import pandas as pd
import os
import sys
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
!git clone https://github.com/weike2001/ds
Cloning into 'ds'...
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 29 (delta 3), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (29/29), 921.93 KiB | 7.26 MiB/s, done.
Resolving deltas: 100% (3/3), done.
import pandas as pd
# Set the paths to the Excel files in the cloned repository
file_path_2022 = '/content/ds/data/2022_QS_World_University_Rankings_Results_public_version.xlsx'
file_path_2023 = '/content/ds/data/2023 QS World University Rankings V2.1 (For qs.com).xlsx'
file_path_2024 = '/content/ds/data/2024 QS World University Rankings 1.2 (For qs.com).xlsx'
# Read the data into pandas DataFrames
df_2022 = pd.read_excel(file_path_2022)
df_2023 = pd.read_excel(file_path_2023)
df_2024 = pd.read_excel(file_path_2024)
# Build CSV file paths next to the original Excel files
csv_file_path_2022 = file_path_2022.replace('.xlsx', '.csv')
csv_file_path_2023 = file_path_2023.replace('.xlsx', '.csv')
csv_file_path_2024 = file_path_2024.replace('.xlsx', '.csv')
# Save the DataFrames as CSV files
df_2022.to_csv(csv_file_path_2022, index=False)
df_2023.to_csv(csv_file_path_2023, index=False)
df_2024.to_csv(csv_file_path_2024, index=False)
# Now you can work with the DataFrames directly or the saved CSV files
# For example, you can print the head of the 2022 DataFrame
print(df_2022.head())
Unnamed: 0 Unnamed: 1 2011 Unnamed: 3 \
0 NATIONAL REGIONAL 2022 2021
1 RANK RANK RANK RANK
2 rank in country rank in subregion rank display rank display2
3 1 1 1 1
4 1 1 2 5
Unnamed: 4 Unnamed: 5 \
0 Institution Name Location
1 NaN CODE
2 institution country code
3 Massachusetts Institute of Technology (MIT) US
4 University of Oxford UK
Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 ... Unnamed: 15 \
0 NaN Classification NaN NaN ... NaN
1 COUNTRY / TERRITORY SIZE FOCUS RES. ... RANK
2 country size focus research ... er rank
3 United States M CO VH ... 4
4 United Kingdom L FC VH ... 3
Unnamed: 16 Unnamed: 17 Unnamed: 18 Unnamed: 19 \
0 Faculty Student NaN Citations per Faculty NaN
1 SCORE RANK SCORE RANK
2 fsr score fsr rank cpf score cpf rank
3 100 12 100 6
4 100 5 96 34
Unnamed: 20 Unnamed: 21 Unnamed: 22 Unnamed: 23 \
0 International Faculty NaN International Students NaN
1 SCORE RANK SCORE RANK
2 ifr score ifr rank isr score isr rank
3 100 45 91.4 105
4 99.5 83 98.5 52
Unnamed: 24
0 Overall
1 SCORE
2 score scaled
3 100
4 99.5
[5 rows x 25 columns]
Assign clean column names to each year's CSV file
import pandas as pd
# Define the new specific column names
specific_column_names_2022 = [
'National Rank', 'Regional Rank', '2022 Rank', '2021 Rank', 'Institution Name',
'Location Code', 'Country/Territory', 'Size', 'Focus', 'Research Intensity',
'Age Band', 'Status', 'Academic Reputation Score', 'Academic Reputation Rank',
'Employer Reputation Score', 'Employer Reputation Rank', 'Faculty Student Score',
'Faculty Student Rank', 'Citations per Faculty Score', 'Citations per Faculty Rank',
'International Faculty Score', 'International Faculty Rank', 'International Students Score',
'International Students Rank', 'Overall Score'
]
specific_column_names_2023 = [
'2023 Rank', '2022 Rank', 'Institution Name', 'Location Code', 'Country/Territory',
'Size', 'Focus', 'Research Intensity', 'Age Band', 'Status',
'Academic Reputation Score', 'Academic Reputation Rank',
'Employer Reputation Score', 'Employer Reputation Rank',
'Faculty Student Score', 'Faculty Student Rank',
'Citations per Faculty Score', 'Citations per Faculty Rank',
'International Faculty Score', 'International Faculty Rank',
'International Students Score', 'International Students Rank',
'International Research Network Score', 'International Research Network Rank',
'Employment Outcomes Score', 'Employment Outcomes Rank',
'Overall Score'
]
specific_column_names_2024 = [
'2024 Rank', '2023 Rank', 'Institution Name', 'Location Code', 'Country/Territory',
'Size', 'Focus', 'Research Intensity', 'Status',
'Academic Reputation Score', 'Academic Reputation Rank',
'Employer Reputation Score', 'Employer Reputation Rank',
'Faculty Student Score', 'Faculty Student Rank',
'Citations per Faculty Score', 'Citations per Faculty Rank',
'International Faculty Score', 'International Faculty Rank',
'International Students Score', 'International Students Rank',
'International Research Network Score', 'International Research Network Rank',
'Employment Outcomes Score', 'Employment Outcomes Rank',
'Sustainability Score', 'Sustainability Rank',
'Overall Score'
]
print(len(specific_column_names_2024))
# Reading the CSV files into Pandas DataFrames
# skiprows=4 skips the CSV header line plus the three QS header rows shown in the preview above
df_2022 = pd.read_csv(csv_file_path_2022, skiprows=4, names=specific_column_names_2022)
df_2023 = pd.read_csv(csv_file_path_2023, skiprows=4, names=specific_column_names_2023)
df_2024 = pd.read_csv(csv_file_path_2024, skiprows=4, names=specific_column_names_2024)
df_2022.head()
df_2023.head()
df_2024.head()  # In a notebook cell, only this last expression is rendered
28
| | 2024 Rank | 2023 Rank | Institution Name | Location Code | Country/Territory | Size | Focus | Research Intensity | Status | Academic Reputation Score | ... | International Faculty Rank | International Students Score | International Students Rank | International Research Network Score | International Research Network Rank | Employment Outcomes Score | Employment Outcomes Rank | Sustainability Score | Sustainability Rank | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Massachusetts Institute of Technology (MIT) | US | United States | M | CO | VH | B | 100.0 | ... | 56 | 88.2 | 128 | 94.3 | 58 | 100.0 | 4 | 95.2 | 51 | 100 |
| 1 | 2 | 2 | University of Cambridge | UK | United Kingdom | L | FC | VH | A | 100.0 | ... | 64 | 95.8 | 85 | 99.9 | 7 | 100.0 | 6 | 97.3 | 33= | 99.2 |
| 2 | 3 | 4 | University of Oxford | UK | United Kingdom | L | FC | VH | A | 100.0 | ... | 110 | 98.2 | 60 | 100.0 | 1 | 100.0 | 3 | 97.8 | 26= | 98.9 |
| 3 | 4 | 5 | Harvard University | US | United States | L | FC | VH | B | 100.0 | ... | 210 | 66.8 | 223 | 100.0 | 5 | 100.0 | 1 | 96.7 | 39 | 98.3 |
| 4 | 5 | 3 | Stanford University | US | United States | L | FC | VH | B | 100.0 | ... | 78 | 51.2 | 284 | 95.8 | 44 | 100.0 | 2 | 94.4 | 63 | 98.1 |
5 rows × 28 columns
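The reads above depend on each names list having exactly as many entries as the file has columns. In our understanding, if the list is too short pandas does not raise; it treats the surplus leading columns as the index, which shifts every label by one or more positions. A more defensive variant, sketched below under the hypothetical name `load_qs_table`, reads the file without names, checks the column count, and only then assigns the labels. It assumes, as the code above does, that the first four lines of each CSV are header text.

```python
import pandas as pd


def load_qs_table(source, column_names, header_rows=4):
    """Read a QS CSV export, skip its header lines, and apply our own column names.

    The file is read without `names` first, so a column-count mismatch raises
    an error instead of silently shifting labels.
    """
    df = pd.read_csv(source, skiprows=header_rows, header=None)
    if df.shape[1] != len(column_names):
        raise ValueError(
            f"expected {len(column_names)} columns, found {df.shape[1]} in {source!r}"
        )
    df.columns = column_names
    return df
```

With the variables defined earlier, the call would be `df_2024 = load_qs_table(csv_file_path_2024, specific_column_names_2024)`.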
In this section, we focus on preparing the 'Overall Score' data from the QS World University Rankings for 2022, 2023, and 2024. The preparation involves two key steps:
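Replace hyphens: substitute NaN (Not a Number) for the hyphens ('-') that appear in place of an Overall Score for some institutions.
Convert to numeric: cast the column to a numeric type to standardize the dataset for numerical analysis.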
This data preparation is essential for analyzing global university ranking trends and setting the stage for further in-depth examination of university performances.
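One caveat about `errors='coerce'`: it turns any unparseable value into NaN, not only hyphens, so an unexpected entry would vanish silently. The sketch below (hypothetical helper `to_numeric_with_report`) converts a column in the same way but also reports how many hyphens it replaced and which other values it could not parse, so they can be inspected before analysis.

```python
import pandas as pd


def to_numeric_with_report(series):
    """Convert a column to numbers, reporting hyphens and other unparseable values.

    Returns (numeric_series, hyphen_count, unexpected_values).
    """
    # True where the cell holds a bare hyphen (possibly padded with spaces)
    is_hyphen = series.map(lambda v: isinstance(v, str) and v.strip() == "-").astype(bool)
    # Hyphens become NaN; anything else unparseable is coerced to NaN as well
    numeric = pd.to_numeric(series.mask(is_hyphen), errors="coerce")
    # Values that were present and not hyphens, yet still failed to parse
    unexpected = numeric.isna() & series.notna() & ~is_hyphen
    return numeric, int(is_hyphen.sum()), series[unexpected].tolist()
```

For example, `scores, n_hyphens, odd = to_numeric_with_report(df_2024['Overall Score'])` yields the same numeric column as the code below, plus a list `odd` that should be empty if hyphens are the only non-numeric entries.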
import pandas as pd
import numpy as np
# Replace hyphens with NaN and convert the column to numeric
df_2022['Overall Score'] = pd.to_numeric(df_2022['Overall Score'].replace('-', np.nan), errors='coerce')
df_2023['Overall Score'] = pd.to_numeric(df_2023['Overall Score'].replace('-', np.nan), errors='coerce')
df_2024['Overall Score'] = pd.to_numeric(df_2024['Overall Score'].replace('-', np.nan), errors='coerce')
# 'Overall Score' is now a float column, with NaN wherever the original held a hyphen
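The rank columns need similar care before ranking shifts between years can be computed. The 2024 preview above already shows tied ranks written as '33=' and '26=', and we assume that lower positions are published as bands such as '501-510' or open-ended labels such as '1401+'. Such values stay as text after loading. The hypothetical `parse_rank` helper below applies one possible convention: it drops the tie marker, maps a closed band to its midpoint, and maps an open-ended band to its lower bound. Other conventions (for instance, using a band's lower bound) are equally defensible.

```python
import math
import re

# Hypothetical pattern: a number, optionally followed by "=", "+", or "-<number>"
_RANK_PATTERN = re.compile(r"^\s*(\d+)\s*(?:=|\+|-\s*(\d+))?\s*$")


def parse_rank(value):
    """Convert a QS rank label to a float, or NaN if it cannot be parsed.

    '33='     -> 33.0   (tie marker dropped)
    '501-510' -> 505.5  (closed band mapped to its midpoint)
    '1401+'   -> 1401.0 (open-ended band mapped to its lower bound)
    """
    if value is None:
        return math.nan
    if isinstance(value, (int, float)):
        return float(value)
    match = _RANK_PATTERN.match(str(value))
    if not match:
        return math.nan
    low = int(match.group(1))
    high = match.group(2)
    return (low + int(high)) / 2 if high else float(low)
```

Applied column-wise, for example `df_2024['2024 Rank'].map(parse_rank)`, it produces a float column whose year-to-year differences can be taken directly.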
In our analysis of the QS World University Rankings datasets for 2022 to 2024, we focus on a selection of metrics that strongly influence a university's standing and global rank. The evaluation covers:
Academic Reputation Score
Employer Reputation Score
Citations per Faculty Score
International Faculty Score
International Students Score
Overall Score
For each of these metrics, we compute the mean, standard deviation, median, minimum, and maximum. Together these show the typical score, how widely institutions vary, and the full range observed in each area, giving stakeholders a concise view of how university performance is distributed.
import pandas as pd
# Assuming df_2022, df_2023, and df_2024 have already been loaded
# Selected metrics to compute summary statistics
selected_metrics = [
'Academic Reputation Score', 'Employer Reputation Score',
'Citations per Faculty Score', 'International Faculty Score',
'International Students Score', 'Overall Score'
]
# Function to calculate and print selected descriptive statistics
def print_selected_statistics(df, year, metrics):
    print(f"Selected Descriptive Statistics for {year}:")
    stats = df[metrics].describe().loc[['mean', 'std', 'min', '50%', 'max']]
    print(stats, "\n")  # Prints the mean, standard deviation, min, median (50%), and max
# Call the function for each year's DataFrame
print_selected_statistics(df_2022, "2022", selected_metrics)
print_selected_statistics(df_2023, "2023", selected_metrics)
print_selected_statistics(df_2024, "2024", selected_metrics)
Selected Descriptive Statistics for 2022:
Academic Reputation Score Employer Reputation Score \
mean 21.552462 22.193000
std 23.315627 24.535947
min 1.000000 1.000000
50% 11.900000 11.950000
max 100.000000 100.000000
Citations per Faculty Score International Faculty Score \
mean 26.293308 26.503746
std 28.299027 35.429502
min 1.000000 1.000000
50% 13.400000 5.400000
max 100.000000 100.000000
International Students Score Overall Score
mean 28.119059 44.767066
std 31.211629 18.961269
min 1.000000 24.100000
50% 13.200000 38.600000
max 100.000000 100.000000
Selected Descriptive Statistics for 2023:
Academic Reputation Score Employer Reputation Score \
mean 20.124684 20.657143
std 22.802706 24.027928
min 1.000000 1.000000
50% 10.800000 10.300000
max 100.000000 100.000000
Citations per Faculty Score International Faculty Score \
mean 24.529358 31.659517
std 27.910952 34.170817
min 1.000000 1.000000
50% 11.100000 13.750000
max 100.000000 100.000000
International Students Score Overall Score
mean 26.545348 44.619400
std 30.896854 18.655057
min 1.000000 24.200000
50% 10.800000 38.550000
max 100.000000 100.000000
Selected Descriptive Statistics for 2024:
Academic Reputation Score Employer Reputation Score \
mean 20.132043 19.806880
std 22.365895 23.764625
min 1.600000 1.000000
50% 10.900000 9.500000
max 100.000000 100.000000
Citations per Faculty Score International Faculty Score \
mean 23.940163 30.948834
std 28.075573 34.247562
min 1.000000 1.000000
50% 10.400000 13.050000
max 100.000000 100.000000
International Students Score Overall Score
mean 25.575035 40.879900
std 30.867149 19.181335
min 1.000000 19.800000
50% 9.850000 34.550000
max 100.000000 100.000000
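The three printed blocks above are easier to compare when stacked into a single table. The sketch below (hypothetical helper `yearly_summary`) applies the same `describe()` selection to each year and concatenates the results under a (year, statistic) index. It assumes each DataFrame holds the metric columns used above, and it coerces them to numbers first so a stray text value cannot drop a column from `describe()`.

```python
import pandas as pd


def yearly_summary(frames, metrics, stats=("mean", "std", "50%")):
    """Stack describe() output for several years into one table.

    frames: mapping of year label -> DataFrame containing the `metrics` columns.
    Returns a DataFrame indexed by (Year, Statistic) with one column per metric.
    """
    pieces = {
        year: df[metrics].apply(pd.to_numeric, errors="coerce").describe().loc[list(stats)]
        for year, df in frames.items()
    }
    return pd.concat(pieces, names=["Year", "Statistic"])
```

For example, `yearly_summary({"2022": df_2022, "2023": df_2023, "2024": df_2024}, selected_metrics).xs("mean", level="Statistic")` would give a year-by-metric table of means.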
This section examines each metric used in the QS World University Rankings to show how universities are evaluated and ranked globally.
The QS ranking framework uses six weighted metrics, each quantifying a distinct aspect of university performance: Academic Reputation (40%), Employer Reputation (10%), Faculty Student ratio (20%), Citations per Faculty (20%), International Faculty (5%), and International Students (5%).
The Overall Score is a consolidated assessment derived from these individual metrics, and it determines the university's rank.
By plotting the distribution of each metric for every year, we aim to clarify what underpins the rankings and where institutions have room to improve their global standing.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Assuming the dataframes df_2022, df_2023, and df_2024 have already been loaded
# Define the metrics and their weights according to the QS methodology
# (2022-edition weights; the 2024 edition reweighted these and added new indicators)
qs_metrics_weights = {
    'Academic Reputation Score': 0.40,
    'Employer Reputation Score': 0.10,
    'Faculty Student Score': 0.20,
    'Citations per Faculty Score': 0.20,
    'International Faculty Score': 0.05,
    'International Students Score': 0.05
}
# Function to analyze and plot each metric
def analyze_qs_metrics(df, year):
    print(f"Analyzing QS Ranking Metrics for {year}")
    for metric, weight in qs_metrics_weights.items():
        df[metric] = pd.to_numeric(df[metric], errors='coerce')  # Ensure the metric is numeric
        # Plot the distribution of each metric
        plt.figure(figsize=(10, 6))
        sns.histplot(df[metric].dropna(), kde=True)
        plt.title(f"Distribution of {metric} in {year}")
        plt.xlabel(metric)
        plt.ylabel('Frequency')
        plt.show()
        # Print the weight of the metric
        print(f"{metric} has a weight of {weight*100}% in the overall ranking.\n")
# Analyze metrics for each year
analyze_qs_metrics(df_2022, '2022')
analyze_qs_metrics(df_2023, '2023')
analyze_qs_metrics(df_2024, '2024')
Analyzing QS Ranking Metrics for 2022
Academic Reputation Score has a weight of 40.0% in the overall ranking.
Employer Reputation Score has a weight of 10.0% in the overall ranking.
Faculty Student Score has a weight of 20.0% in the overall ranking.
Citations per Faculty Score has a weight of 20.0% in the overall ranking.
International Faculty Score has a weight of 5.0% in the overall ranking.
International Students Score has a weight of 5.0% in the overall ranking.
Analyzing QS Ranking Metrics for 2023
Academic Reputation Score has a weight of 40.0% in the overall ranking.
Employer Reputation Score has a weight of 10.0% in the overall ranking.
Faculty Student Score has a weight of 20.0% in the overall ranking.
Citations per Faculty Score has a weight of 20.0% in the overall ranking.
International Faculty Score has a weight of 5.0% in the overall ranking.
International Students Score has a weight of 5.0% in the overall ranking.
Analyzing QS Ranking Metrics for 2024
Academic Reputation Score has a weight of 40.0% in the overall ranking.
Employer Reputation Score has a weight of 10.0% in the overall ranking.
Faculty Student Score has a weight of 20.0% in the overall ranking.
Citations per Faculty Score has a weight of 20.0% in the overall ranking.
International Faculty Score has a weight of 5.0% in the overall ranking.
International Students Score has a weight of 5.0% in the overall ranking.
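The loop above opens a separate figure for every metric, eighteen figures across the three years. A more compact alternative, sketched below under the hypothetical name `plot_metric_grid`, draws one year's metrics as a grid of histograms in a single figure. It swaps Seaborn's KDE overlay for plain Matplotlib histograms, and it returns the figure instead of calling `plt.show()`, so the caller decides whether to display or save it.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_metric_grid(df, year, metrics, ncols=3, bins=30):
    """Draw one histogram per metric in a single figure and return the figure."""
    nrows = math.ceil(len(metrics) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 3.5 * nrows), squeeze=False)
    flat_axes = axes.ravel()
    for ax, metric in zip(flat_axes, metrics):
        # Coerce to numbers so hyphens or other text do not break the histogram
        values = pd.to_numeric(df[metric], errors="coerce").dropna()
        ax.hist(values, bins=bins)
        ax.set_title(metric)
        ax.set_xlabel("Score")
        ax.set_ylabel("Frequency")
    # Hide grid cells left over when the metric count is not a multiple of ncols
    for ax in flat_axes[len(metrics):]:
        ax.set_visible(False)
    fig.suptitle(f"Distribution of QS metrics in {year}")
    fig.tight_layout()
    return fig
```

For instance, `plot_metric_grid(df_2024, "2024", list(qs_metrics_weights))` followed by `plt.show()` would render all six 2024 distributions side by side.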
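To gain a deeper understanding of the global landscape of higher education as reflected in the QS World University Rankings, we use choropleth maps to visualize the number of ranked universities in each country for 2022, 2023, and 2024. Comparing the three maps shows which countries host the most ranked institutions and how that distribution changes from year to year.
The function create_choropleth_map is designed to:
Count universities per country: tally the values of the chosen column (here 'Country/Territory').
Tabulate the counts: store them in a DataFrame with 'Country' and 'University_Count' columns.
Draw the map: shade each country by its count with Plotly Express, matching countries by name and using a sequential red color scale.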
Here's a brief overview of the function and its application:
from IPython.display import display, HTML

def configure_plotly_browser_state():
    # Load require.js and point it at the Plotly CDN so figures can render in Colab output cells
    display(HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: 'https://cdn.plot.ly/plotly-latest.min.js?noext',
            },
          });
        </script>
        '''))

def enable_plotly_in_cell():
    from plotly.offline import init_notebook_mode
    display(HTML('''<script src="/static/components/requirejs/require.js"></script>'''))
    init_notebook_mode(connected=False)
import pandas as pd
import plotly.express as px
import plotly
#enable_plotly_in_cell()
def create_choropleth_map(dataframe, column_name, title):
    # Generate a dictionary of value counts for the specified column
    sample_data = dataframe[column_name].value_counts().to_dict()
    # Convert the dictionary into a DataFrame
    df_counts = pd.DataFrame(list(sample_data.items()), columns=['Country', 'University_Count'])
    #print(df_counts)
    # Create the choropleth map
    fig = px.choropleth(df_counts,
                        locations="Country",
                        locationmode='country names',
                        color="University_Count",
                        color_continuous_scale=px.colors.sequential.Reds,  # Reds color scale
                        title=title)
    # Update the layout
    fig.update_layout(
        geo=dict(
            showframe=False,
            showcoastlines=False,
            projection_type='equirectangular'
        )
    )
    #configure_plotly_browser_state()
    # Show the figure
    fig.show(renderer="notebook")
# Use the function with your DataFrame and column
create_choropleth_map(df_2022, 'Country/Territory', 'Number of Universities per Country in 2022')
create_choropleth_map(df_2023, 'Country/Territory', 'Number of Universities per Country in 2023')
create_choropleth_map(df_2024, 'Country/Territory', 'Number of Universities per Country in 2024')
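Because `locationmode='country names'` matches on each country's name, an entry in 'Country/Territory' that Plotly does not recognize is, as far as we know, simply left off the map rather than raising an error. The hypothetical helper below separates the counting step from the plotting step and accepts an optional alias dictionary for renaming such entries before counting. Which names actually need aliases is an assumption to verify against the data; any alias shown in a usage example is illustrative only.

```python
import pandas as pd


def count_by_country(df, column="Country/Territory", aliases=None):
    """Count ranked institutions per country, optionally renaming countries first.

    aliases: optional dict mapping names used in the dataset to the names the
    map library expects (hypothetical; build it from names that fail to match).
    """
    names = df[column].dropna().astype(str).str.strip()
    if aliases:
        names = names.replace(aliases)
    return (
        names.value_counts()
        .rename_axis("Country")
        .reset_index(name="University_Count")
    )
```

The resulting DataFrame has the same 'Country' and 'University_Count' columns that create_choropleth_map builds internally, so it could be passed straight to `px.choropleth`.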
Our Excel files come from the links below: